distant heliograph operator had difficulty in receiving the flashes reliably from the
sender, and it might therefore have been decided to repeat each flash three times,
with the recipient using majority selection on each group of three to deduce the
message. The capacity of the channel would thereby be reduced to one third.
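The threefold repetition with majority selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: flashes are modelled as the symbols 0 and 1, and the function names `encode_repetition` and `decode_majority` are hypothetical, not taken from the text.

```python
from collections import Counter


def encode_repetition(symbols, n=3):
    """Repeat every symbol n times (n = 3 in the heliograph example)."""
    return [s for s in symbols for _ in range(n)]


def decode_majority(received, n=3):
    """Split the received stream into groups of n and keep the most common symbol of each."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return [Counter(group).most_common(1)[0][0] for group in groups]


message = [1, 0, 1, 1]              # assumed binary flashes
sent = encode_repetition(message)   # three flashes per message symbol

# Simulate one misread flash in the second group of three.
received = list(sent)
received[4] = 1

decoded = decode_majority(received)
```

A single misread flash within a group is outvoted by the other two, so `decoded` equals `message`, at the cost of sending three flashes for every symbol of the message.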
In many practical cases, the physical medium for transmitting messages has to be
shared by many different messages. It is a great advantage of optical communications
that streams of photons of different wavelengths do not interfere with one another.
Therefore, an optical fibre can carry many independent signals. Inside a cell, in
which the cytoplasm is a shared medium, many different molecules are present and
independence is determined by the differential chemical affinities between pairs of
molecules.
7.2 Coding
Coding refers to the transduction of a message into another form. It is ubiquitous
in our world. Ideas are encoded into words, music, and pictures; one language may be
encoded into another; and so on. We have already made extensive use of binary
coding; the compact disc-based recording industry today uses binary coding almost
exclusively for music, pictures, and words. Evidently any number can be written in
base 2; hence, a possible drill (algorithm) for binary coding consists of the following
steps:
1. Assign a number to each state to be encoded;
2. Convert that number into base 2.
A DNA sequence can thereby be converted into binary form by making the assignments
A → 1, C → 2, T → 3, and G → 4, which in base 2 are 1, 10, 11, and 100,
respectively. The coded sequence would have to be written (001, 010, etc.) and read
in groups of three digits, otherwise “AA” could be misinterpreted as “T” and so forth.
Alternatively, separators can be introduced (see also the Huffman code described near
the beginning of Sect. 7.4). The reading frame is thus defined as the series of groups
of three beginning with the first. DNA is an example of a usually nonoverlapping
code of contiguous triplets.
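The two-step drill applied to DNA can be illustrated with a short Python sketch. It uses the assignment given above (A, C, T, G to 1, 2, 3, 4) and pads every number to three binary digits. The names `ASSIGNMENT`, `WIDTH`, `encode_dna`, and `decode_dna` are hypothetical and chosen only for this example.

```python
# Step 1: assign a number to each state (the assignment used in the text).
ASSIGNMENT = {"A": 1, "C": 2, "T": 3, "G": 4}
WIDTH = 3  # 4 is 100 in base 2, so three digits are needed per base


def encode_dna(sequence):
    """Step 2: write each number in base 2, zero-padded to WIDTH digits."""
    return "".join(format(ASSIGNMENT[base], f"0{WIDTH}b") for base in sequence)


def decode_dna(bits):
    """Read the bit string in contiguous groups of WIDTH, starting with the first digit."""
    inverse = {number: base for base, number in ASSIGNMENT.items()}
    groups = [bits[i:i + WIDTH] for i in range(0, len(bits), WIDTH)]
    return "".join(inverse[int(group, 2)] for group in groups)


coded = encode_dna("ACTG")
roundtrip = decode_dna(coded)

# Without padding, "AA" becomes "11", which is also the unpadded code for T.
unpadded_aa = "".join(format(ASSIGNMENT[base], "b") for base in "AA")
unpadded_t = format(ASSIGNMENT["T"], "b")
```

Here `coded` is the twelve-digit string formed from the groups 001, 010, 011, and 100, and `unpadded_aa` equals `unpadded_t`, which is the ambiguity that fixed-width groups (or separators) avoid.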
Codes may be written as transformations, e.g.,
    ↓  A  B  C  D  ···  Z
       B  C  D  E  ···  A ,
which could also be written down compactly by the instruction “replace each letter
by the next one to the right” (sfqmbdf fbdi mfuufs cz uif ofyu pof up uif sjhiu). A
scheme for recoding DNA could be
    ↓  A  C  T  G
       1  2  3  4
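
Both transformations can be represented as lookup tables, with the top row as keys and the bottom row as values. The Python sketch below makes that concrete. The names `SHIFT`, `DNA`, and `apply_code` are hypothetical, and letting symbols outside the table (such as spaces) pass through unchanged is an assumption of this example, not something the text specifies.

```python
import string

letters = string.ascii_uppercase

# Top row A..Z mapped onto bottom row B..Z, A (each letter to the next, Z wrapping to A).
SHIFT = dict(zip(letters, letters[1:] + letters[0]))

# The DNA recoding: A, C, T, G mapped onto 1, 2, 3, 4.
DNA = {"A": "1", "C": "2", "T": "3", "G": "4"}


def apply_code(table, message):
    """Replace each symbol by its image in the table; other symbols are left unchanged."""
    return "".join(table.get(symbol, symbol) for symbol in message)


shifted = apply_code(SHIFT, "REPLACE EACH LETTER")
recoded = apply_code(DNA, "GATTACA")
```

With these tables, `shifted` is "SFQMBDF FBDI MFUUFS", matching the start of the coded phrase quoted above, and `recoded` is "4133121".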